The nonstructural protein 1 (nsp1) of the severe acute respiratory syndrome coronavirus has 179 residues 
and is the N-terminal cleavage product of the viral replicase polyprotein that mediates RNA replication and 
processing. The specific function of nsp1 is not known. Here we report the nuclear magnetic resonance 
structure of the nsp1 segment from residue 13 to 128, which represents a novel α/β-fold formed by a mixed 
parallel/antiparallel six-stranded β-barrel, an α-helix covering one opening of the barrel, and a 3₁₀-helix 
alongside the barrel. We further characterized the full-length 179-residue protein and show that the polypep- 
tide segments of residues 1 to 12 and 129 to 179 are flexibly disordered. The structure is analyzed in a search 
for possible correlations with the recently reported activity of nsp1 in the degradation of mRNA. 


After the major outbreak of severe acute respiratory syn- 
drome (SARS) in the beginning of 2003, the SARS coronavirus 
(SARS-CoV) became a major topic of coronavirus research. 
The coronavirus genome is composed of a single plus-strand 
RNA of about 30 kb, which is the largest nonsegmented ge- 
nome among known RNA viruses. About two-thirds of the 
coronavirus genome is devoted to encoding the replicase that 
mediates viral RNA synthesis (64). The replicase gene comprises 
two large open reading frames (ORFs) located at the 5’ 
end of the genome. The first one, ORF1a, encodes a polyprotein 
of 450 to 500 kDa (polyprotein 1a), and the second one, 
ORF1b, is translated together with ORF1a after a -1 ribosomal 
frameshift, leading to the expression of the “polyprotein 
1ab,” which has a size of 750 to 800 kDa. The replicase polyprotein 
is processed by ORF1a-encoded viral proteinases, 
yielding about 16 nonstructural proteins (nsp), which are numbered 
consecutively from the N terminus to the C terminus of 
the polyprotein (14, 53). 

The large number of mature proteins produced from the 
polyprotein indicates a high level of complexity of the viral 
replication process. Some of the enzymatic activities that were 
detected or predicted in SARS-CoV to date include the main 
protease (nsp5), a papain-like proteinase (nsp3d; PLpro), an 
RNA-dependent RNA polymerase (nsp12), an RNA helicase 
(nsp13), an endoribonuclease (nsp15), an ADP-ribose-1″-phosphatase 
(nsp3b), a deubiquitinase (nsp3d), a 3’→5’ exoribonuclease 
(nsp14), and a ribose-2’-O-methyltransferase (nsp16) 
(3, 5, 18, 21, 22, 44, 59, 70). For the two proteases, the endoribonuclease, 
and the ADP-ribose-1″-phosphatase, three-dimensional 
structures have been solved, which together with biochemical 
data have revealed some aspects of the enzyme 
mechanisms (27, 54, 56, 58, 63, 67; reviewed in references 4, 35, 
and 65). The physiological functions of several of the other 
nonstructural proteins remain to be determined, but high-res- 
olution structure determinations have allowed the identifica- 
tion of possible functional sites and provided the basis for 
further biochemical studies (15, 30, 52, 61, 62, 68). Other 
replicase proteins still remain to be characterized. 

Nsp1 is the N-terminal cleavage product of the replicase 
polyprotein and is produced by the action of PLpro. It is 
among the least well-understood nsps, and other than in coro- 
naviruses, no viral or cellular homologs are known. Levels of 
sequence conservation among the different coronaviruses are 
highest at the 3’ end of the genome, and the sequences are very 
divergent at the 5’ end, especially in nsp1 to nsp3, which are 
products of PLpro cleavage. nsp1 has been proposed to be 
useful as a group-specific marker (59). In the group 1 corona- 
viruses, nsp1 (also known as p9) is a protein of about 110 
residues, with 20 to 50% sequence identity among all group 1 
CoVs. The viruses of subgroup 2a, such as murine hepatitis 
virus (MHV) and human coronavirus OC43, encode an nsp1 
protein of about 245 residues, also known as p28, while the 
group 3 viruses (avian) do not encode an nsp1. The nsp1 of 
SARS-CoV, which has been classified as the only member to 
date of the subgroup 2b (19, 20, 59), comprises 180 residues, 
with a molecular mass of 20 kDa. nsp1 sequences are divergent 
between groups 2a and 2b, and no sequence similarity between 
SARS-CoV nsp1 and group 2a nsp1 proteins could be identi- 
fied using standard searching tools such as BLAST. 

Biochemical experiments demonstrated interactions be- 
tween MHV nsp1 and two other replication proteins (nsp7 and 
nsp10) and colocalization with nonstructural proteins and the 
nucleocapsid protein at viral replication complexes in the cy- 
toplasm during the early stages of infection (6). In contrast, 
during the later stages of infection, MHV nsp1 was found to 
colocalize with structural proteins at virion assembly sites (6). 
Mutations at the nsp1/nsp2 cleavage site of MHV that pre- 
vented the cleavage of nsp1 from the polyprotein caused 
slower growth and reduced RNA synthesis relative to wild-type 
viruses (13). Deletion of the nsp1-coding region in infectious 
clones of MHV yielded viruses that were unable to produc- 
tively infect cultured cells (7). Furthermore, exogenous expres- 
sion of MHV nsp1 in mammalian cells arrested the cell cycle in 
the G0/G1 phase and inhibited cell proliferation (8). A point 
mutation in the proteolytic cleavage site between nsp1 and 
nsp2 in the full-length genome, and in minigenomes of the 
group 1 CoV porcine transmissible gastroenteritis virus, blocked 
the release of nsp1 from the nascent polyprotein and caused 
a dramatic reduction in virus viability (17). SARS-CoV nsp1 
was shown to specifically accelerate the degradation of mRNA 
and thus lead to a reduction in cellular protein synthesis, 
which may provide a survival advantage for the virus (31). 
Overall, these observations indicate that nsp1 might partic- 
ipate in multiple stages of the coronavirus life cycle, and 
they implicate this protein as a potentially important viru- 
lence factor. 

This paper describes the nuclear magnetic resonance (NMR) 
structure of SARS-CoV nsp1. Before this study, no infor- 
mation about the three-dimensional structure of nsp1 was 
available, and SARS-CoV nsp1 does not have significant 
amino acid sequence similarity with any protein with known 
three-dimensional structure. SARS-CoV nsp1 was therefore 
selected for NMR structure determination by the Consor- 
tium for Functional and Structural Proteomics of SARS- 
CoV-Related Proteins (http://sars.scripps.edu). The avail- 
ability of a high-resolution solution structure will help to 
guide further investigations of the biochemical and physio- 
logical functions of nsp1. 


MATERIALS AND METHODS 


Target optimization strategy. Six constructs truncated at different positions 
were created based on secondary structure prediction, using the amino acid 
sequence as input for the software Jpred (12). The truncated nsp1 variants were 
cloned in an Escherichia coli expression plasmid derived from pET-28 under the 
control of the T7 promoter and in frame with 5’ coding sequences for a His6 tag 
and followed by a spacer sequence which ends with ENLYFQG. This strategy 
allows the preparation of proteins that have only one extra glycine at the N 
terminus after proteolysis with tobacco etch virus protease. This expression 
strategy was selected after tests of different E. coli strains and different temper- 
atures during the induction in 10-ml cultures. The samples were expressed in a 
microexpression device (M. S. Almeida, M. Geralt, R. Horst, and K. Wüthrich, 
unpublished), purified using Ni2+ affinity chromatography, concentrated with 
ultrafiltration centrifugal devices, and subjected to one-dimensional (1D) 1H 
NMR screening using a Bruker DRX700 spectrometer with a 1-mm TXI HCN 
z-gradient microprobe. Based on the high-quality 1D 1H NMR spectrum, the 
construct consisting of nsp1 residues 13 to 128 [nsp1(13-128)] was selected for an 
NMR structure determination. In an attempt to further improve this sample, 
variant constructs of nsp1(13-128) with Cys 52 replaced by Ala, Ser, Arg, or Asp 
were prepared by site-directed mutagenesis using the QuikChange kit (Strat- 
agene) according to the manufacturer’s instructions. The variants were evaluated 
by 1D 1H NMR screening for a globular fold and by circular dichroism spectroscopy 
to determine their stability. 

Protein preparation. Large-scale expression of uniformly 15N-labeled or 13C- 
and 15N-labeled nsp1(13-128) in E. coli BL21(DE3) cells was carried out at 18°C 
in 500 ml of M9 minimal medium containing either 0.5 g NH4Cl or 0.5 g 
15NH4Cl and 2 g [13C6]-D-glucose as the sole nitrogen and carbon sources, 
respectively. For the protein purification, the cells were disrupted by sonication 
in the presence of 25 mM HEPES at pH 8.0, 250 mM NaCl, 2 mM dithiothreitol, 
0.03% NaN3, and EDTA-free Complete protease inhibitor tablets (Roche). The 
cell lysate was loaded onto a 10-ml HisTrap FF column equilibrated with 50 mM 
imidazole in the same buffer system as mentioned above. The retained proteins 
were eluted with a 50 to 500 mM imidazole gradient and incubated with recom- 
binant tobacco etch virus protease at 22°C for 2 days. The resulting solution was 
loaded onto a 300-ml Superdex 75 column equilibrated with 25 mM sodium 
phosphate at pH 7.0, 250 mM NaCl, and 0.03% NaN3. The protein eluted with 
a retention volume equivalent to about 13 kDa. The solution was concentrated 
with ultrafiltration centrifugal devices and supplemented with 10% D2O to a final 
sample volume of about 300 µl. 

NMR spectroscopy and structure calculation. The NMR samples contained 2 
mM of nsp1(13-128). NMR spectra were collected at 298 K with Bruker Avance 
600-MHz and Avance 800-MHz spectrometers equipped with TXI HCN z- 
gradient probes. The sequence-specific resonance assignment (66) has been 
described elsewhere (1). The input for the structure calculation consisted of the 
chemical shift list obtained from the resonance assignment, a 3D 15N-resolved 
1H,1H nuclear Overhauser effect spectroscopy (NOESY) spectrum, and two 3D 
13C-resolved 1H,1H NOESY spectra optimized for the aliphatic and aromatic 
13C regions. The nuclear Overhauser effect (NOE) data were measured at 800 
MHz with a mixing time of 60 ms. For the peak picking of the NOESY spectra, 
NOE assignment, and structure calculation, the stand-alone ATNOS/CANDID 
program (24, 25) was used in conjunction with the CYANA torsion angle dy- 
namics algorithm (23). The standard protocol with seven cycles of peak picking, 
NOE assignment, and 3D structure calculation with simulated annealing in 
torsion angle space (24, 25) was applied. Backbone φ and ψ dihedral angle 
constraints derived from the Cα chemical shifts (40, 60) were used as supplementary 
data in the structure calculation. The 20 conformers with the lowest 
residual CYANA target function values obtained from cycle 7 of the ATNOS/ 
CANDID/CYANA calculation were energy minimized in a water shell with the 
program OPALp (34, 39), using the AMBER force field (9). The program 
MOLMOL (33) was used to analyze the protein structure and to prepare the 
figures showing the NMR structures. Analysis of the stereochemical quality of 
the models was accomplished using the Joint Center for Structural Genomics 
validation central suite (http://www.jcsg.org) and the Protein Data Bank valida- 
tion server (http://deposit.pdb.org/validate). 

Steady-state 15N{1H} NOEs were measured with transverse relaxation-optimized 
spectroscopy (TROSY)-based experiments (55, 69) on a Bruker Avance 
600-MHz spectrometer, using a saturation period of 3 s and an interscan 
delay of 5 s. 

Accession numbers. The chemical shifts have been deposited in the Bio- 
MagResBank (http://www.bmrb.wisc.edu) under accession number 7014. The 
atomic coordinates of the bundle of 20 conformers used to represent the nsp1 
structure have been deposited in the Protein Data Bank (http://www.rcsb.org/pdb) 
with the code 2GDT, and those of the conformer closest to the mean 
coordinates have the code 2HSX. 


RESULTS AND DISCUSSION 


The 179-residue nsp1 of SARS-CoV was included in a high- 
throughput proteomics characterization strategy by the consor- 
tium “Functional and Structural Proteomics of the SARS- 
CoV” (unpublished). The full-length protein and the fragment 
consisting of residues 1 to 159 were cloned, expressed and 
purified by the Protein Production Core, and given to us for 
NMR screening (50). The 1D 1H NMR spectra of both constructs 
showed characteristics of a globular fold as well as of 
disordered regions (data not shown). Based on these results, 
the protein was transferred to us for an NMR structure deter- 
mination. 

Since nsp1 has no identifiable sequence similarity with pro- 
teins with known three-dimensional structures, it was not pos- 
sible to predict the domain structure of this protein based on 
sequence comparisons. However, the presence of flexibly disordered 
regions in the protein identified by 1H NMR spectroscopy 
(see “Characterization of the full-length SARS-CoV 
nsp1” below) was consistent with the results of secondary struc- 
ture prediction, which indicated that a few residues at the N 
terminus, as well as a greater number of residues in the C- 
TABLE 1. Summary of the recombinant production of nsp1 
variants in BL21(DE3) E. coli cells 

                  Molecular    Soluble expressionᵇ at:     Protein     Expression 
Name              massᵃ                                    recovery    level 
                  (kDa)        37°C     27°C     18°C      (mg)        (µM)ᶜ 
nsp1              19.5         -/+      -/+      +         3           15 
nsp1(13-179)      18.3         -/+      -/+      ++        2           11 
nsp1(1-149)       16.1         -/+      -/+      ++        3           19 
nsp1(13-149)      15.0         -/+      -/+      ++        2           13 
nsp1(1-128)       13.8         -/+      +        ++        4           29 
nsp1(13-128)      12.7         -/+      +        ++        3           24 

ᵃ Without the expression tag of 3,500 Da (see text). 

ᵇ The soluble expression levels were evaluated by denaturing polyacrylamide 
gel electrophoresis of the total cell lysate and of the soluble fraction thereof. 
-/+, less than 25% soluble protein; +, about 50% soluble protein; ++, more 
than 85% soluble protein. 

ᶜ The value is for purified protein in a 10-ml volume of buffer. The proteins 
were purified from cells grown at 18°C in 10 ml of culture. 


terminal one-third of the protein, would not adopt regular 
secondary structure. To investigate the boundaries of the glob- 
ular domain and to optimize conditions for protein expression, 
sample preparation, and NMR structure determination, we 
designed a set of truncated variants of nsp1, bearing in mind 
the results of secondary structure predictions for this protein. 
The variant constructs have molecular masses of 12.7 kDa to 
19.5 kDa, not including the N-terminal tag of 3.5 kDa (Table 
1). These constructs were used to transform E. coli strains 
Rosetta(DE3), BL21(DE3) RIL, and BL21(DE3), and the re- 
combinant proteins were expressed in a microshaker at 37°C, 
27°C, and 18°C. The best growth rates and expression levels 
were obtained with the strain BL21(DE3). Table 1 provides a 
survey of the expression results with six different nsp1 con- 
structs. Most of the protein in the samples expressed at 37°C 
was insoluble for all six variants. For two constructs, higher 
yields of soluble protein were obtained at 27°C, but the best 
results were achieved with expression at 18°C, where most of 
the expressed protein was in the soluble fraction. 

Based on the results in Table 1, BL21(DE3) E. coli cells in 
cultures at 18°C were used for the protein production. The 
proteins were purified by Ni2+ affinity chromatography and gel 
filtration chromatography. The final protein recovery in 10 ml 
of culture was in the range of 2 to 4 mg, which represents 
expression levels in the range of 11 to 29 µM (Table 1). Samples 
were concentrated for 1D 1H NMR screening with a 
microcoil probe (Almeida et al., unpublished). The two short- 
est constructs, nsp1(1-128) and nsp1(13-128), exhibited the 
highest expression levels. The nsp1(13-128) construct was se- 
lected for the structure determination, based on its high-quality 
1H NMR spectrum and on its greater stability in comparison to 
the other five constructs of Table 1. 
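
The relation between the protein recovery and the expression level quoted above can be checked with a short calculation. The following is a minimal sketch, assuming that the micromolar values refer to the recovered protein dissolved in the 10-ml volume stated in the Table 1 footnote; the function name and the dictionary are illustrative and not part of the original work.

```python
def expression_level_uM(recovery_mg: float, mass_kda: float,
                        volume_ml: float = 10.0) -> float:
    """Convert a protein recovery (mg) into a concentration (micromolar).

    Assumes the protein is dissolved in `volume_ml` of buffer, as stated
    in the Table 1 footnote (10 ml).
    """
    moles = (recovery_mg * 1e-3) / (mass_kda * 1e3)  # g divided by g/mol
    liters = volume_ml * 1e-3
    return moles / liters * 1e6


# Values from Table 1: name -> (molecular mass in kDa, recovery in mg)
TABLE_1 = {
    "nsp1": (19.5, 3),
    "nsp1(13-179)": (18.3, 2),
    "nsp1(1-149)": (16.1, 3),
    "nsp1(13-149)": (15.0, 2),
    "nsp1(1-128)": (13.8, 4),
    "nsp1(13-128)": (12.7, 3),
}

for name, (mass_kda, recovery_mg) in TABLE_1.items():
    level = expression_level_uM(recovery_mg, mass_kda)
    print(f"{name:14s} {level:5.1f} uM")
```

Rounded to whole numbers, the computed values reproduce the last column of Table 1, which supports reading that column as the recovery expressed in molar units.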

Since cysteine residues are susceptible to oxidation and for- 
mation of intermolecular disulfide bonds, which can lead to 
unstable and heterogeneous protein samples, we also investigated 
variant constructs of nsp1(13-128) with Cys 52 replaced 
by Ala, Ser, Arg, or Asp as part of our initial target optimization 
strategy, using 1D 1H NMR and circular dichroism spectroscopy 
to evaluate their foldedness and stability. The variants 
with Ser 52, Asp 52, or Arg 52 were thus found to be unstable. 
The variant with Cys 52 replaced by Ala led to a stable, folded 
TABLE 2. Input for the structure calculation and characterization 
of the bundle of 20 energy-minimized CYANA conformers 
representing the NMR structure of nsp1(13-128) 

Parameter                                      Valueᵃ 
NOE upper distance limits 
  (intraresidual, short-range, 
  medium-range, long-range)                    2,659 (566, 692, 393, 1008) 
Dihedral angle constraints                     100 
Residual target function (Å²)                  1.96 ± 0.35 
Residual NOE violations 
  No. ≥0.1 Å                                   28 ± 5 
  Maximum (Å)                                  0.19 ± 0.16 
Residual dihedral angle violations 
  No. ≥2.5°                                    11 
  Maximum (°)                                  4.66 ± 1.27 
AMBER energies (kcal/mol) 
  Total                                        -3,966.42 ± 91.80 
  van der Waals                                -331.44 ± 14.88 
  Electrostatic                                -4,633.80 ± 89.58 
RMSDᵇ from ideal geometry 
  Bond lengths (Å)                             0.0075 ± 0.0002 
  Bond angles (°)                              2.039 ± 0.047 
RMSD to the mean coordinates (Å)ᶜ 
  bb (14-74, 85-125)                           0.45 ± 0.06 
  ha (14-74, 85-125)                           0.91 ± 0.07 
Ramachandran plot statistics (%)ᵈ 
  Most favored regions                         73 
  Additional allowed regions                   22 
  Generously allowed regions                   3 
  Disallowed regions                           2 

ᵃ Except for the NOE upper distance limits, dihedral angle constraints, and 
Ramachandran plot statistics, the average values for the 20 energy-minimized 
conformers with the lowest residual CYANA target function values and the 
standard deviations among them are listed. 

ᵇ RMSD, root mean square deviation. 

ᶜ bb, backbone atoms N, Cα, and C’; ha, all heavy atoms. The numbers in 
parentheses indicate the residues for which the RMSD was calculated. 

ᵈ As determined by PROCHECK (45). 


protein. However, after observing excellent sample stability of the 
wild-type protein despite the single Cys residue, we chose the 
wild-type protein for the structure determination. 

The parameters in Table 2 show that a well-defined NMR 
structure of nsp1(13-128) was obtained. Above-average local 
disorder is limited to the C-terminal heptapeptide segment of 
residues 122 to 128 and to a disordered loop of residues 77 to 
86 (Fig. 1a). The structure of intact nsp1 includes a globular 
domain of residues 13 to 121 and the disordered regions of 
residues 1 to 12 and 122 to 179 (Fig. 1b). 

The structure of nsp1(13-128) represents a new fold. The 
sequential arrangement of the regular secondary structures in 
the globular domain of nsp1 is β1-α1-β2-3₁₀-β3-β4-β5-β6. 
There is a mixed parallel/antiparallel six-stranded β-barrel, 
where the spatial arrangement of the β-strands is β1-β2-β5-β3-β4-β6, 
and β1 makes contact with β6 (Fig. 2 and 3). The 
β-strands consist of residues 15 to 21, 52 to 56, 69 to 73, 87 to 
92, 104 to 110, and 117 to 124. The helix α1 with residues 36 to 
49 is located across one barrel opening, and the 3₁₀-helix of 
FIG. 1. (a) Bundle of 20 energy-minimized CYANA conformers of nsp1(13-128). In this stereo view, the polypeptide backbone is shown as a 
gray spline function through the Cα positions. Selected sequence positions are identified by numerals. (b) Ribbon representation of the closest 
conformer to the mean coordinates of the bundle of 20 conformers used to represent the NMR structure. The β-strands are cyan, the helices are 
red, and polypeptide segments with nonregular secondary structure are gray. The regular secondary structures are further identified by lettering. 
The polypeptide segments shown in green represent the additional, structurally disordered polypeptide segments of the full-length nsp1. 


residues 62 to 64 is positioned alongside the barrel. A search of 
the Protein Data Bank using the structure of nsp1 as input for 
the DALI server (26) did not indicate statistically significant 
structural similarity to any other protein described to date. 
For the continued discussion it is helpful to adopt a systematic 
analysis of β-barrels, using the number of strands (n); the 
shear number (S), which measures the stagger of the strands; 
and the tilt angle (α) of each strand, which is the angle between 
the barrel axis and the line adjusted for best fit to the N, Cα, 
and C’ atoms of each strand (43, 46, 47). S must be an even 
integer because of the hydrogen bonding pattern between the 
β-strands (46). Using standard values for the mean Cα-Cα 
distance along the strands (a = 3.3 Å) and between the strands 
(b = 4.4 Å), the following geometric relations characteristic of 
β-barrel structures in proteins have been proposed (43): 


\[ \tan\alpha = \frac{Sa}{nb} \qquad (1) \]
\[ R = \frac{\left[(Sa)^{2} + (nb)^{2}\right]^{1/2}}{2n\,\sin(\pi/n)} \qquad (2) \]


R is the barrel radius, which is defined as the average of the 
distances between the Cα atoms of the three residues in opposite 
strands that are closest to the central part of the barrel 
(47). 

The β-barrel in nsp1 contains six strands and has a shear 
number (S) of 10. The measured tilt of the strands to the barrel 
axis (α) ranges from 38° (β2) to 78° (β4), with an average value 
of 60°. The wide variation among the tilt angles of the individual 
strands shows that the β-barrel of nsp1 is markedly 
irregular (Fig. 2a and b). 
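
To make the definition of the tilt angle concrete, the following is a minimal sketch of how such an angle could be measured from coordinates: a line is fitted to the backbone atoms of one strand, and the acute angle between that line and a given barrel axis is returned. The barrel axis is taken as an input, the line fit uses a simple power iteration, and the test coordinates are synthetic; none of this is taken from the original analysis, which does not describe its numerical procedure.

```python
import math


def best_fit_direction(points, iterations=100):
    """Unit vector along the least-squares line through a set of 3D points.

    Power iteration on the 3x3 scatter matrix, started from the
    first-to-last vector (assumed not to be perpendicular to the line,
    and the points are assumed not to coincide).
    """
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    centered = [[p[k] - centroid[k] for k in range(3)] for p in points]
    scatter = [[sum(c[i] * c[j] for c in centered) for j in range(3)]
               for i in range(3)]
    v = [points[-1][k] - points[0][k] for k in range(3)]
    for _ in range(iterations):
        w = [sum(scatter[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def tilt_angle_deg(strand_atoms, barrel_axis):
    """Acute angle (degrees) between a strand's best-fit line and the axis."""
    d = best_fit_direction(strand_atoms)
    axis_norm = math.sqrt(sum(x * x for x in barrel_axis))
    cos_tilt = abs(sum(d[k] * barrel_axis[k] for k in range(3))) / axis_norm
    return math.degrees(math.acos(min(1.0, cos_tilt)))


# Synthetic, perfectly straight "strand" tilted by 60 degrees from the z axis.
tilt = math.radians(60.0)
direction = (math.sin(tilt), 0.0, math.cos(tilt))
strand = [tuple(3.3 * t * component for component in direction)
          for t in range(7)]
print(round(tilt_angle_deg(strand, (0.0, 0.0, 1.0)), 1))
```

For a real strand, the input would be the N, Cα, and C’ coordinates of its residues, and the result depends on how the barrel axis itself is defined.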

The residues used to calculate the radius of the nsp1 barrel 
are 18 to 20, 53 to 55, 71 to 73, 86 to 88, 106 to 108, and 122 
to 124, which give a mean barrel radius (R) of 7 ± 1 Å. Overall, 
for nsp1 the theoretical tilt angle of 51° 
calculated from equation 1 differs from the observed 
average of 60°, whereas the theoretical value of the 
mean barrel radius of 7 Å, as calculated from equation 2, is in 
close agreement with the observed value of 7 ± 1 Å. 
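
The theoretical values quoted above follow directly from equations 1 and 2 with n = 6, S = 10, a = 3.3 Å, and b = 4.4 Å. The following is a minimal sketch of that arithmetic; the function names are illustrative, and the sine term in equation 2 is taken to be sin(π/n), which is the reading that reproduces the quoted radius.

```python
import math


def barrel_tilt_deg(n: int, S: int, a: float = 3.3, b: float = 4.4) -> float:
    """Theoretical strand tilt (degrees) from equation 1: tan(alpha) = Sa/(nb)."""
    return math.degrees(math.atan((S * a) / (n * b)))


def barrel_radius(n: int, S: int, a: float = 3.3, b: float = 4.4) -> float:
    """Theoretical barrel radius (angstroms) from equation 2."""
    return math.hypot(S * a, n * b) / (2 * n * math.sin(math.pi / n))


# nsp1 barrel: six strands, shear number 10
print(f"tilt   = {barrel_tilt_deg(6, 10):.1f} degrees")
print(f"radius = {barrel_radius(6, 10):.2f} angstroms")
```

With these inputs the tilt comes out near 51° and the radius near 7 Å, matching the theoretical values given in the text.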

The interior of the nsp1 barrel and the interfaces between 
the two helices and the barrel surface consist primarily of 
hydrophobic residues. The arrangement of the side chains in- 
side the barrel is highly compact, as expected for a barrel of six 
strands, but the inspection of space-filling models suggests that 
FIG. 2. Two stereo views of the globular domain of nsp1. (a) Ribbon presentation of the closest conformer of nsp1 to the mean coordinates 
of the bundle in Fig. 1a, shown in the same orientation as in Fig. 1a. The organization of the β-strands in the barrel is indicated by the labels. (b) 
Same as panel a after rotation about a horizontal axis, so that one looks at one side of the β-barrel; the axes of the β-barrel and the helix α1 are 
nearly perpendicular to each other, and those of the barrel and the 3₁₀-helix are nearly parallel to each other. 


FIG. 3. Two topology diagrams of the nsp1 mixed parallel/antiparallel 
six-stranded β-barrel (see text). The numbering indicates the first 
and last residues of each β-strand. 


there is a tight cavity along the center of the barrel, with a 
radius of about 1.2 Å (not shown). The inside of the barrel 
consists of 17 side chains, which are contributed by all six 
strands and which are arranged in three layers. One layer 
contains L105 and the four hydrophilic residues E56, R74, 
K85, and R120. The four peripheral hydrophilic groups mediate 
the contacts with the solvent at the barrel opening opposite 
to helix α1 (Fig. 4) (in the orientation of Fig. 2b, these residues 
would be at the bottom of the structure). The charged groups 
of the side chains of these residues are fully solvent exposed, 
and E56 makes a salt bridge with R120. The side chain of V21 
in the second layer and the βCH2-γCH2 fragment of R120 are 
located between this first layer and the other side chains of the 
second layer, which is in the narrowest portion of the barrel 
and includes the all-hydrophobic side chains of residues L54, 
I72, V87, and V122. A third layer consists of the side chains of 
residues L17, L19, V70, L89, A91, L108, and L124, which make 
hydrophobic contacts with the side chains of residues V36, 
A39, L40, A43, and L47 from the amphipathic helix α1, and 
the side chains of residues C52, F32, P110, and P68. The 
FIG. 4. Stereo view of nsp1(13-128) in the same orientation as in Fig. 2b. The side chains in the interior of the barrel are differently colored 
to visualize their arrangement in three layers, as discussed in the text. The polypeptide backbone is shown as a gray spline function through the 
Cα positions. Amino acid side chains are shown as stick drawings. Color code: red, residues of layer 1 at the barrel opening opposite to helix α1, 
where the four hydrophilic residues are in solvent contact; green, residues in the central layer 2; blue, residues of the third layer, which make 
hydrophobic contacts to the residues shown in magenta at the top, where V36, A39, L40, A43, and L47 originate from the amphipathic helix α1. 


second and third layers of β-strand side chains thus combine 
with the inner side of the helix α1 to form a large hydrophobic 
core (Fig. 4). It is worth noting that the variant proteins with 
Cys 52 replaced by Ser, Asp, or Arg were unstable, which is 
consistent with a disruption of the β-barrel core, as one would 
predict from the NMR structure. 

In the SCOP database (48), there are 13 folds with β-barrels 
of n = 6 and S = 10. Nine of these folds contain antiparallel 
β-barrels, i.e., the QueA-like fold (41), the hypothetical protein 
HI1480 (37), the ferredoxin reductase-like fold (10), the 
phage tail protein (42), the flavin mononucleotide-binding split 
barrel (36), the reductase/isomerase/elongation factor common 
domain (10), the elongation factor/aminomethyltransferase 
common domain (32), the core binding factor β (28), 
and the surface presentation of antigens fold (16). Four folds 
have mixed parallel/antiparallel β-barrels similar to the nsp1 
fold, but none has the topology observed in nsp1 (Fig. 3). 
These include the ribosomal protein L25-like (57), the β and 
β’ subunits of DNA-dependent RNA polymerase (11), the 
double-ψ β-barrel (38), and the acid protease fold (51). In 
addition to the apparently unique β-strand topology and the 
irregular β-barrel geometry, another interesting feature of 
the nsp1 fold is that the polypeptide chains connecting the 
β-strands run along the side of the barrel, except for the loop 
between β3 and β4 (Fig. 2 and 3). This is a rare feature for 
barrels with n = 6 and S = 10, and besides nsp1, it has been 
observed only between two β-strands in the ribosomal protein 
L25-like fold. 

It is intriguing that none of the aforementioned folds are 
quite as irregular as that of nsp1. The distortion of the nsp1 
structure seems to be related to the polypeptide segments 
connecting the β-strands across the side of the barrel. Interestingly, 
although the adjoining ends of strands β5 and β6 are 
the furthest apart in space of all strand combinations in nsp1 
(approximately 15 Å between P110 and I117), they are connected 
by the shortest polypeptide segment across the side of 
the barrel (Fig. 2 and 3). This imposes a lower limit on the 
shear between these strands. The shear number of 10 seems to 
be the result of a balance between tight hydrophobic packing 
inside the nsp1 barrel, which is favored by lower shear numbers, 
and unstrained arrangement of the linker polypeptide 
segments on the outside of the barrel, which is favored by larger 
shear numbers. We discuss the β-barrel topology in much 
detail in order to advance the hypothesis that the outstanding 
irregularity of the nsp1 β-barrel might be related to a so-far-unknown, 
possibly entirely novel physiological function of 
nsp1. 

The arrangement of the linker polypeptide segments on the 
outside of the barrel is puzzling also with regard to the folding 
pathway of nsp1. For example, if the strand β1 formed hydrogen 
bonds with β2 early during translation, this would also fix 
the first linker across the barrel, which might limit the ease 
with which β6 could make hydrogen bonds with β4 and β1. 
Schemes representing the topology of the β-barrel (Fig. 3) 
would intuitively suggest that folding starts midway during 
translation with the formation of a β-hairpin of the strands β3 
and β4. In the folded protein, this pair of β-strands forms the 
least distorted part of the β-barrel, with highly regular hydrogen 
bonds, and the loop between β3 and β4 is the only one that 
does not run along the barrel surface. In subsequent folding 
steps the two-stranded sheets of β2 and β5 and of β1 and β6, 
respectively, might be formed, which also have quite regular 
hydrogen bonding in the nsp1 structure. The linkers between 
β2 and β3 and between β4 and β5 have almost the same 
lengths, which should help to position β5 close to β2 if β4 
is arranged close to β3. The three regular two-stranded 
β-sheets (Fig. 3a) are connected in the barrel by the formation 
of irregular hydrogen bonding patterns. 

Characterization of the full-length SARS-CoV nsp1. The 
full-length nsp1 was characterized by comparison of the numbers 
of backbone 15N-1H correlation peaks and the HN and 
15N chemical shifts with those of nsp1(13-128) and by heteronuclear 
NOE measurements of the truncated and full-length 
nsp1. The truncated construct nsp1(13-128) has an NMR spectrum 
with large 1H and 15N chemical shift dispersion (Fig. 5a), 
which is typical for a well-folded globular domain, where the 
atoms of different individual amino acid residues experience 
different local microsusceptibilities due to the nonperiodic nature 
of the interiors of globular proteins. The spectrum of the 
full-length protein, nsp1(1-179) (Fig. 5b), contains a set of 
peaks that overlays very closely with those of nsp1(13-128), 
showing that the globular domain is contained in both constructs. 
All the additional peaks have HN chemical shifts of 7.9 
to 8.5 ppm, which is the region characteristic of “random-coil” 
polypeptide chains (66). 

FIG. 5. (a) 2D 15N,1H heteronuclear single-quantum coherence (HSQC) spectrum of nsp1(13-128). (b) 2D 15N,1H HSQC spectrum of 
full-length nsp1(1-179). (c) 2D TROSY-based 15N{1H} NOE experiment with full-length nsp1(1-179), with negative peaks shown in red. The 
spectra were recorded at a 1H frequency of 600 MHz at 298 K. 

A direct measure of intramolecular mobility is provided by 
the heteronuclear 2D ¹⁵N{¹H} NOE experiment, which is routinely 
used to assess protein dynamics on the picosecond to 
nanosecond timescale (55, 69). Positive signals with intensities 
of ~0.8 identify residues in the folded cores of small and 
medium-size globular proteins, with mobility of the individual 
¹⁵N-¹H moieties restricted to the overall rotational tumbling of 
the molecule. This is illustrated with the ¹⁵N{¹H} NOE data 
for nsp1(13-128) (Fig. 6), which also serve as a reference for 
assessing the state of the additional chain segments in nsp1(1-179). 
¹⁵N{¹H} NOE values of about 0.8 are seen for most of 
the residues in the regular secondary structure elements (Fig. 
6). Increased flexibility of the polypeptide chain that causes 
reduced NOE intensities is found in the disordered loop be- 
tween residues 75 and 87 and in the region of residues 94 to 
103, which forms nonregular secondary structure with one 
γ-turn of residues 97 to 99 and a type II β-turn of residues 98 
to 101 (Fig. 1). Most of the resonances in full-length nsp1 that 
are not present in nsp1(13-128) have either small positive or 
negative ¹⁵N{¹H} NOEs (Fig. 5c), showing that the polypeptide 
segments of residues 1 to 12 and 129 to 179 are best 
described as a short N-terminal and a long C-terminal flexibly 
disordered tail, respectively (Fig. 1b). Interestingly, it has been 
determined that the carboxy-terminal half of the related protein 
MHV nsp1 is not needed for viral replication in culture 
but is important for efficient proteolytic cleavage between nsp1 
and nsp2 and for optimal viral replication (7). 

FIG. 6. Plot of the ¹⁵N{¹H} NOE intensities versus the sequence of 
nsp1(13-128). The data were collected at a ¹H frequency of 600 MHz 
at 298 K. The positions of the regular secondary structure elements are 
indicated. Each point represents the mean of three measurements, and 
the error bars represent the standard deviations of the three measurements. 
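
The classification of residues by their heteronuclear NOE values can be illustrated with a short sketch. It assumes the common convention that the ¹⁵N{¹H} NOE of a residue is the ratio of its peak intensity with proton saturation to that without saturation, and it averages three repeat measurements as described for Fig. 6. The intensities, the residue labels, and the cutoff of 0.65 separating rigid from flexible residues are hypothetical choices for illustration, not values taken from this work.

```python
from statistics import mean, stdev


def noe_ratio(i_sat, i_ref):
    """Steady-state NOE as saturated / reference peak intensity."""
    return i_sat / i_ref


def summarize(measurements):
    """Return (mean, sample standard deviation) of repeated NOE ratios."""
    ratios = [noe_ratio(s, r) for s, r in measurements]
    return mean(ratios), stdev(ratios)


def classify(noe_mean, rigid_cutoff=0.65):
    """Label a residue; the cutoff is an assumed, adjustable threshold."""
    if noe_mean >= rigid_cutoff:
        return "rigid"
    if noe_mean > 0.0:
        return "flexible (small positive NOE)"
    return "flexible (negative NOE)"


# Hypothetical (saturated, reference) intensities from three experiments.
data = {
    "core residue": [(80.0, 100.0), (82.0, 100.0), (78.0, 100.0)],
    "loop residue": [(30.0, 100.0), (35.0, 100.0), (25.0, 100.0)],
    "tail residue": [(-20.0, 100.0), (-25.0, 100.0), (-15.0, 100.0)],
}

labels = {}
for name, meas in data.items():
    avg, sd = summarize(meas)
    labels[name] = classify(avg)
```

A negative mean ratio corresponds to the negative peaks described for the full-length protein in Fig. 5c.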

Structure-based search for nsp1 functions. As an initial step toward 
identifying leads to possible nsp1 functions, Fig. 7a 
shows the solvent-exposed residues of nsp1, which would be 
sterically accessible for intermolecular contacts with reaction 
partners. These residues give rise to an uneven electrostatic 
surface charge distribution (Fig. 7b), with a large negative 
surface on one face and hydrophobic, polar, and positively 
charged residues forming the opposite surface. The large con- 
tiguous patches of positive and negative surface charge could 
mediate specific as well as nonspecific intermolecular interac- 
tions by electrostatic forces. For example, considering that it 
has been shown that nsp1 promotes mRNA degradation (31), 
the area of positive charge on the molecular surface formed by 
K48, R125, and K126 (Fig. 7b) is of interest as a potential site 
for a direct interaction with mRNA. Alternatively, the posi- 
tively and negatively charged areas of the protein surface might 
be involved in protein-protein interactions, and nsp1 might 
then exert its biological effect not by direct interactions with 
the mRNA but by interacting with other proteins involved in 
the regulation of cellular mRNA stability (49). In this context 
it seems worth mentioning that the NMR structure determi- 
nation of nsp1(13-128) was performed in 250 mM NaCl be- 
cause the protein precipitated at lower ionic strengths, which 
might be due to self-aggregation of nsp1 caused by the uneven 
charge distribution. 

For comparisons with the nsp1 proteins of other coronavi- 
ruses, data are available for the p9 proteins of group 1 CoVs 
and for the p28 proteins of group 2a CoVs. Using database 
searches such as BLAST or PSI-BLAST (2), no significant 
sequence similarity between nsp1 of SARS-CoV and proteins 
of group 2a coronaviruses could be identified. However, pairwise 
alignment with the FFAS server (29), employing a profile/profile-based 
method that is able to detect distant relationships, 
identified 20% sequence identity between SARS-CoV 
nsp1 and MHV nsp1 (p28) over 174 aligned residues (Fig. 7c). 
The FFAS score of −9.6 indicates that these two proteins 
might share the novel nsp1 three-dimensional fold, but the 
remaining sequence divergence leaves open the possibility that 
these proteins might perform different functions even if they 
had a common fold. To our knowledge, it has not yet been 
determined whether the p28 proteins of group 2a coronavi- 
ruses also promote mRNA degradation, but MHV p28 expres- 
sion has been shown to cause cell cycle arrest in cultured cells. 
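
As a rough illustration of how a percent identity over a number of aligned residues is obtained, the sketch below counts identical positions in a gapped pairwise alignment. It assumes that aligned residues are the columns in which neither sequence has a gap. This convention, the function name, and the toy sequences are assumptions for illustration and do not reproduce the FFAS alignment of Fig. 7c.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (identical, aligned, percent) for two gapped, equal-length strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        aligned += 1
        if a == b:
            identical += 1
    percent = 100.0 * identical / aligned if aligned else 0.0
    return identical, aligned, percent


# Toy alignment, not real nsp1 or p28 sequence data.
seq_a = "MKT-LRKAGNKG"
seq_b = "MRSALRKVG-KG"
identical, aligned, percent = percent_identity(seq_a, seq_b)
```

Other conventions, such as dividing by the length of the shorter sequence, give different percentages, so the denominator should always be stated alongside the value.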

The most striking result of the alignment of SARS-CoV 
nsp1 with the polypeptide fragment consisting of residues 46 to 
247 of MHV p28 is the observation of a consensus sequence, 
LRKxGxKG, positioned at the end of strand β6 of the globular 
domain of SARS-CoV nsp1, which is conserved not only in 
MHV p28 (Fig. 7c) but also in human CoV OC43 p28. It 
includes the two residues R125 and K126, which contribute to 
the positively charged patch on the nsp1 molecular surface 
(Fig. 7b). If future studies of the p28 proteins of group 2a CoVs 
should show that these proteins share mRNA degradation 
activity with SARS-CoV nsp1, this conserved region could be a 
candidate for mRNA interaction. 

Analysis of the nsp1 structure also provides indications for 
functional differences between the p28 proteins and SARS- 
CoV nsp1. For example, the motif K109-R110-L111 in MHV 
p28 was identified by Chen et al. as a potential cyclin-binding 
motif (8), and SARS-CoV nsp1 lacks residues corresponding 
to R110 and L111. In addition, Chen et al. identified residues 
30 to 33 (S/NPER) of p28 as a potential site for phosphoryla- 
tion by cyclin-dependent kinases (8). These residues occur in 
an N-terminal 45-residue segment of p28 that appears not to 
be homologous to SARS-CoV nsp1. The propensity to induce 
cell cycle arrest may therefore be unique to MHV p28, or 
possibly to the group 2a p28 proteins in general, and it might 
not be shared by SARS-CoV nsp1 even if it turned out that 
these proteins all share a similar fold. 

In other comparisons, no significant sequence identity be- 
tween SARS-CoV nsp1 and the nsp1 (p9) proteins of the 
group 1 CoVs could be detected. These results are consistent 
with the analysis by Snijder et al. (59), who described nsp1 as 
a specific marker of group 2 CoVs. The p9 proteins of group 1 
CoVs most likely differ from those of group 2 CoVs in both 
structure and function. 

The MHV1 p28 protein was subjected to a mutagenesis 
study by Brockway et al. (7), who generated single-amino- 
acid replacements and truncated versions of this protein and 
studied their impact on viral replication in cultured cells. 
Among the mutations found to affect viral replication, only 
some occur in residues conserved between MHV1 p28 and 
SARS-CoV nsp1 (Fig. 7c). Deletion of the entire p28 pro- 
tein or of the polypeptide segment from residue 87 to 164 of 
MHV p28 prevented the virus from productively infecting 
cultured cells (7). If MHV p28 and SARS-CoV nsp1 did 
indeed share a similar fold, the latter construct would lack 
most of the globular domain. In contrast, the carboxy-ter- 
minal half of MHV p28 (residues 124 to 241) has been 
shown to be dispensable for replication in culture, but it is 
important for efficient proteolytic cleavage of the protein 
and for optimal viral replication. If MHV p28 were to contain 
regular secondary structures similar to those of SARS-CoV 
nsp1, removal of the polypeptide segment from residue 
124 to 241 would correspond to the loss of the strands β3, 
β4, β5, and β6, as well as of the flexibly disordered C-terminal 
tail, which would appear to entail a considerable 
disruption of the protein fold. The following considerations 
might help to resolve the apparent ensuing discrepancies. 
First, the increased flexibility and lack of a globular fold in 
the C-terminal region of the protein may ensure accessibility 
of the protease recognition site between nsp1 and nsp2 but 
may not be directly involved with the activity exerted by the 
protein. Second, it appears that the strands β1 and β2 and 
the helix α1 might provide for a sufficiently stable fold to 
maintain the so-far-unidentified biological activity, in particular 
if one assumes that the additional N-terminal 45-residue 
segment of MHV p28, which is not homologous to 
SARS-CoV nsp1, could participate in a globular fold and 
help to stabilize the shortened protein. 

FIG. 7. (a) Amino acid sequence of nsp1, with solvent-exposed residues highlighted in green. A residue is considered to be exposed if at least 
one atom of its side chain has more than 50% surface accessibility to the solvent. For glycines, the CO and HN exposure is considered. (b) Surface 
views of nsp1 in a space-filling representation. In the surface view shown on the left, the structure has the same orientation as in Fig. 2b. Some 
of the surface-exposed side chains discussed in the text are identified with the one-letter amino acid code and the residue number. Color code: gray, 
hydrophobic and polar residues; red, negatively charged; blue, positively charged. (c) Sequence alignment between SARS-CoV nsp1 and MHV 
nsp1 identified with the FFAS server. Identical residues are shown in red. Arrows indicate single-amino-acid replacements in MHV p28 that were 
generated and studied by Brockway et al. (7). Mutations that are detrimental to the viral replication are identified by boldface, while those that 
are not detrimental are in italic. Residues removed in the truncated variant protein MHV1 nsp1ΔC are shown in lowercase (see text). Residues 
in β-strands and in helical secondary structures are underlined with solid and dashed lines, respectively. 

In conclusion, this paper shows that the SARS-CoV protein 
nsp1, which is encoded at the 5’ terminus of the genome, forms 
a previously unknown complex β-barrel fold with several unique 
structural features. We hypothesize that the uniqueness of the 
irregular β-barrel fold may be related to a so-far-unknown, 
unique biological function of nsp1. The definition of the globular 
region of nsp1 and the identification of residues on the molecular 
surface likely to contribute to mRNA degradation activity may 
provide a platform for continued research on the role of this 
protein in SARS-CoV and in other coronaviruses. 